Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix get tags #68

Merged
merged 6 commits into from
Feb 4, 2023
Merged

Fix get tags #68

merged 6 commits into from
Feb 4, 2023

Conversation

wolfv
Copy link
Contributor

@wolfv wolfv commented Feb 3, 2023

Even when asking for N tags, it doesn't mean that the (Github) registry returns that many tags. In my experience we need to use pagination, but I can't judge if this works for registries other than ghcr.io (I would think so though).

This also changes the API and makes it conform to the types indicated (return List[str]), and not a response object.

I tried to run the pre-commit linters but they are currently being pulled from a local repository? Wouldn't it be better to use the upstream ones? E.g.: https://github.com/mamba-org/mamba/blob/9de6782e02259e0d9050a1f74ac47383ff0d65f0/.pre-commit-config.yaml#L29-L43

@wolfv
Copy link
Contributor Author

wolfv commented Feb 3, 2023

PS: this can be tested by asking for tags e.g. for this package: https://github.com/orgs/channel-mirrors/packages/container/package/conda-forge%2Flinux-aarch64%2Farrow-cpp

@vsoch
Copy link
Contributor

vsoch commented Feb 3, 2023

Even when asking for N tags, it doesn't mean that the (Github) registry returns that many tags. In my experience we need to use pagination, but I can't judge if this works for registries other than ghcr.io (I would think so though).

I'm not sure, actually. I didn't think we'd hit the use case of wanting more than the default (is it 50?) but here we are! 😆

This also changes the API and makes it conform to the types indicated (return List[str]), and not a response object.

👍

I tried to run the pre-commit linters but they are currently being pulled from a local repository?

There are different ways to go about this, but I like to install the deps directly to the same environment, so you need to do pip install -r .github/dev-requirements.txt first. There are instructions for that here: https://oras-project.github.io/oras-py/getting_started/developer-guide.html#code-linting and I'll show a screenshot below:

image

Wouldn't it be better to use the upstream ones? E.g.: https://github.com/mamba-org/mamba/blob/9de6782e02259e0d9050a1f74ac47383ff0d65f0/.pre-commit-config.yaml#L29-L43

Maybe? "better" is relative I think - I might just like the better control / grouping with the same deps in my local environment.

@vsoch
Copy link
Contributor

vsoch commented Feb 3, 2023

@wolfv the DCO is unfortunately required by the upstream projects (under CNCF) and out of my control - let me know if you have any issues I can help with around that.

oras/provider.py Outdated Show resolved Hide resolved
@wolfv wolfv force-pushed the fix_get_tags branch 2 times, most recently from b713ef4 to 09d865e Compare February 3, 2023 19:09
@wolfv
Copy link
Contributor Author

wolfv commented Feb 3, 2023

Hmm, looks like I might've received a newer black?

@vsoch
Copy link
Contributor

vsoch commented Feb 3, 2023

Hmm, looks like I might've received a newer black?

Definitely possibly - we don't pin the version, and that is living a bit dangerously. It means if a newer version is grabbed in the CI, I will just update my local environment to use it. E.g., pip install black==23.1.0. Sorry :/

@wolfv
Copy link
Contributor Author

wolfv commented Feb 3, 2023

There are a couple unrelated changes now (some empty lines deleted). I don't care but if you do, I can revert them :)

add raise_for_status and test getting many tags from ghcr.io

Signed-off-by: Wolf Vollprecht <[email protected]>
@vsoch
Copy link
Contributor

vsoch commented Feb 3, 2023

This is looking great! I'm in the middle of a work day and won't be able to review until after (when you are sleeping!) but I'll definitely have time this evening or this weekend to take a closer look, test stuffs locally, etc. With your permission, I can update the PR with any changes or do a PR to your branch if that is preferred. Thanks for doing this fix!

this extends the updated get tags function (with pagination!) to
use a general function, so a future caller can use the same functionality.

Signed-off-by: vsoch <[email protected]>
@wolfv
Copy link
Contributor Author

wolfv commented Feb 4, 2023

Thanks for your changes. Nice find on the requests.Response object, I also (obviously) didn't know about. I think it's much nicer to use it.

I made the pagination function a bit more generic by taking a callable to extract tags (or whatever). lmk what you think. To me, this looks good and works :)

Copy link
Contributor

@vsoch vsoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like these changes! If we have another case that doesn't warrant a callable we can make it optional, but I'm good leaving as is until that's proven. I also really like the response.links - I am glad you added it back.

The one question I have is about providing -1 for all - I think that's something we could honor here, but for the actual query I don't think I see it in the spec? https://github.com/opencontainers/distribution-spec/blob/067a0f5b0e256583bb9a088f72cba85ed043d1d2/spec.md?plain=1#L471-L513 So my suggestions (and let me know what you think!)

  • have -1 be the default to retrieve all tags, but don't pass to the API endpoint.
  • remove "query" from the tags_url property

After we discuss those things I think we should be good to go!

oras/provider.py Outdated Show resolved Hide resolved
oras/container.py Outdated Show resolved Hide resolved
Signed-off-by: Wolf Vollprecht <[email protected]>
@wolfv
Copy link
Contributor Author

wolfv commented Feb 4, 2023

I changed it to use optional[int] for the N parameter. lmk what you think about that. I'll be afk for the rest of the night so feel free to take over and finish this PR if you want :)

Unfortunately mypy didn't accept somethign like

if not N:
   N = 10_000

which is why I had to add the temporary variable (mypy issue here: python/mypy#4805

@vsoch
Copy link
Contributor

vsoch commented Feb 4, 2023

yep let me figure something out! And I'll get this merged / released asap after so we can test with the mirror.

@vsoch vsoch merged commit 7e2a092 into oras-project:main Feb 4, 2023
@wolfv
Copy link
Contributor Author

wolfv commented Feb 4, 2023

Uh oh now get all is limited to 10k tags :)

@wolfv
Copy link
Contributor Author

wolfv commented Feb 4, 2023

If I read the code correctly on mobile, we're now cutting the tags to N and N is set to 10k, even if we intend to retrieve all tags.

@vsoch
Copy link
Contributor

vsoch commented Feb 4, 2023

The logic hasn't changed - I think the bug we missed is that we don't know what to set the N parameter to in order to ask for all of them - should it be unset?

For our function - this should handle asking for more at least:

bool(len(new_tags) and (retrieve_all or len(tags) < N))

retrieve_all True would set wants_more to True. As for the N parameter in the query, you are right if that is going to set a limit we have a bug! I suppose it should be unset if we want all? 🤔

@vsoch
Copy link
Contributor

vsoch commented Feb 4, 2023

okay looks like when it's unset it just returns a value of N as 0. I'm thinking if the value is -1 we can just set to None (so no N parameter required). I brought up a kubernetes cluster I need to work on for a bit but I'll do a suggested fix after that.

# When no query params are set:
response.links
Out[8]: 
{'next': {'url': '/v2/channel-mirrors/conda-forge/linux-aarch64/arrow-cpp/tags/list?last=1.0.1-py37h64c39f3_7_cpu&n=0',
  'rel': 'next'}}

@vsoch
Copy link
Contributor

vsoch commented Feb 4, 2023

This is a really good example of how mypy has influenced the actual design of the library to avoid having None as a default, and as a result has made this simple interaction more challenging. I'm not sure I'm a fan of these typing tools - beyond running the check they don't actually enforce anything!

@wolfv
Copy link
Contributor Author

wolfv commented Feb 4, 2023

Didn't you add a tags[:N] at the end of the function? That's what would cut everything beyond 10k for the retrieve all case if I read correctly

@vsoch
Copy link
Contributor

vsoch commented Feb 4, 2023

Oh that too. But if we ask the api (N) to go up to a certain point it’s not clear if it will stop paginating to. The fix is to allow the parameter to be None like we originally wanted I think and then not to try and set a default.

@vsoch
Copy link
Contributor

vsoch commented Feb 4, 2023

See #70

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants